Preserving sequence annotations across reference sequences

Authors

  • Zuotian Tatum
  • Marco Roos
  • Andrew P. Gibson
  • Peter E. M. Taschner
  • Mark Thompson
  • Erik Schultes
  • Jeroen F. J. Laros
Abstract

BACKGROUND: Matching and comparing sequence annotations of different reference sequences is vital to genomics research, yet many annotation formats do not specify the reference sequence types or versions used. This makes the integration of annotations from different sources difficult and error-prone.

RESULTS: As part of our effort to create linked data for interoperable sequence annotations, we present an RDF data model for sequence annotation using the ontological framework established by the OBO Foundry ontologies and the Basic Formal Ontology (BFO). We defined reference sequences as the common domain of integration for sequence annotations, and identified three semantic relationships between sequence annotations. In doing so, we created the Reference Sequence Annotation to compensate for gaps in the Sequence Ontology (SO) and in its mapping to BFO, particularly for annotations that refer to versions of consensus reference sequences. Moreover, we present three integration models for sequence annotations using different reference assemblies.

CONCLUSIONS: We demonstrated a working example of a sequence annotation instance, and how this instance can be linked to other annotations on different reference sequences. Sequence annotations in this format are semantically rich and can be integrated easily with different assemblies. We also identify other challenges of modeling reference sequences with the BFO.
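The abstract's central idea, that an annotation should name the exact versioned reference sequence it sits on, can be sketched with plain subject-predicate-object triples. This is a minimal illustration only: the `http://example.org/` namespace, the predicate names, and the helper functions are hypothetical and are not the vocabulary or data model defined in the paper, and the accession and version numbers are used purely as sample values.

```python
from typing import List, NamedTuple, Set


class Triple(NamedTuple):
    """A bare subject-predicate-object statement, as in RDF."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and predicate names, chosen for illustration only;
# they are NOT the terms defined by the paper or by any published ontology.
EX = "http://example.org/"
ON_REFERENCE = EX + "onReference"
START = EX + "start"
END = EX + "end"


def annotation_triples(annotation_id: str, accession: str, version: int,
                       start: int, end: int) -> List[Triple]:
    """Describe one annotation and tie it to a versioned reference sequence."""
    annotation = f"{EX}annotation/{annotation_id}"
    # The version is part of the reference identifier, so two versions of the
    # same accession are distinct resources.
    reference = f"{EX}reference/{accession}.{version}"
    return [
        Triple(annotation, ON_REFERENCE, reference),
        Triple(annotation, START, str(start)),
        Triple(annotation, END, str(end)),
    ]


def references(triples: List[Triple]) -> Set[str]:
    """Collect the reference sequences that a set of triples points at."""
    return {t.obj for t in triples if t.predicate == ON_REFERENCE}


def share_reference(a: List[Triple], b: List[Triple]) -> bool:
    """True when two annotations are expressed on the same versioned reference."""
    return bool(references(a) & references(b))


first = annotation_triples("a1", "NC_000001", 10, 100, 200)
second = annotation_triples("a2", "NC_000001", 11, 100, 200)
third = annotation_triples("a3", "NC_000001", 10, 150, 160)

print(share_reference(first, second))  # False: same accession, different version
print(share_reference(first, third))   # True: same versioned reference
```

Under these assumptions, coordinates are compared directly only when the versioned reference matches; linking annotations across different versions would need an explicit relationship between the references, which this sketch deliberately leaves out.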

Related articles

Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium

The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be bas...

Listeria Monocytogenes La111 and Klebsiella Pneumoniae KCTC 2242: Shine-Dalgarno Sequences

Listeria monocytogenes can cause serious infection and recently, relapse of listeriosis has been reported in leukemia and colorectal cancer, and the patients with Klebsiella pneumoniae are at increased risk of colorectal cancer. Translation initiation codon recognition is basically mediated by Shine-Dalgarno (SD) and the anti-SD sequences at the small ribosomal RNA (ssu rRNA). In this research,...

Compression of large DNA databases

The thesis explores algorithms to efficiently store and access repetitive DNA sequence collections produced by large-scale genome sequencing projects. First, existing general-purpose and DNA compression algorithms are evaluated for their suitability for compressing large collections of DNA sequences. Then two novel algorithms for compressing large collections of DNA sequences are introduced. Th...

Cross-Corpus Evaluation of Word Alignment

We present the procedures we implemented to carry out system-oriented evaluation of a syntax-based word aligner, ALIBI. We regard cross-corpus evaluation as part of system-oriented evaluation, assuming that corpus type may impact alignment performance. We test our system on three English–French parallel corpora. The evaluation procedures include the creation of a referenc...

TreeGrafter: phylogenetic tree-based annotation of proteins with Gene Ontology terms and other annotations

Summary: TreeGrafter is a new software tool for annotating protein sequences using annotated phylogenetic trees. Currently, the tool provides annotations to Gene Ontology terms, and PANTHER protein class, family and subfamily. The approach is generalizable to any annotations that have been made to internal nodes of a reference phylogenetic tree. TreeGrafter takes each input query protein sequen...

Journal title:

Volume 5, Issue

Pages -

Publication date: 2014